A new result of the scaling law of weighted $\ell_1$ minimization

Jun Zhang, Urbashi Mitra, Kuan-Wen Huang and Nicolo Michelusi 


Abstract: This paper studies recovery conditions of weighted $\ell_1$
minimization for signal reconstruction from compressed sensing
measurements. A sufficient condition for exact recovery by using
the general weighted $\ell_1$ minimization is derived, which builds
a direct relationship between the weights and the recoverability.
Simulation results indicate that this sufficient condition provides
a precise prediction of the scaling law for the weighted $\ell_1$
minimization.

Index Terms: Compressive sensing, weighted $\ell_1$ minimization,
scaling law, signal reconstruction.


From convex analysis, we introduce the following Lemma.

Lemma 1 (a) A vector $x \in \mathbb{R}^n$ is a global minimum of the
model (2) if and only if $\exists\, Wu \in \partial\|Wx\|_1$ such that

$$\frac{1}{m}A^T(Ax - y) + hWu = 0 \tag{6}$$

(b) If $|u_i| < 1$ for all $i \notin \hat{S}$ and $A_{\hat{S}}$ is full rank, then $x$ is the
unique minimum and $x_i = 0$ for all $i \notin \hat{S}$, where $\hat{S}$ denotes
the support set of vector $x$.


I. Introduction

To recover vector $x^*$ from measurement

$$y = Ax^* + Z \tag{1}$$

where $A = [A_1, A_2, \ldots, A_n] \in \mathbb{R}^{m \times n}$ is the i.i.d. Gaussian
random matrix whose rows are distributed as $\mathcal{N}(0, \sigma_a^2 I)$ (assume $I$ is the
$n \times n$ identity matrix) and $Z \sim \mathcal{N}(0, \sigma_z^2 I)$ is i.i.d. Gaussian
noise, we consider the following weighted $\ell_1$ minimization:

$$\hat{x} = \arg\min_x \frac{1}{2m}\|Ax - y\|_2^2 + h\|Wx\|_1, \quad h > 0 \tag{2}$$

where $W \in \mathbb{R}^{n \times n}$ is a matrix whose off-diagonal elements are zero and
the diagonal elements

$$w_i \in (0, +\infty) \tag{3}$$

Note that due to the presence of noise, it is generally impossible
to seek exact recovery of the sparse signal $x^*$. Accordingly,
this paper focuses on the goal that the optimum solution $\hat{x}$
and the true signal $x^*$ have their nonzero entries at the same
locations and with the same signs, i.e., sparsity pattern recovery
or support recovery.
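The weighted problem above can be solved numerically with a proximal gradient (ISTA) iteration. The following is a minimal sketch, not the solver used in the paper: it assumes the data term is normalized as $\frac{1}{2m}\|Ax-y\|_2^2$ (the scaling consistent with the optimality condition (6)), and the function name `weighted_l1_ista`, the iteration count and the step-size rule are illustrative choices.

```python
import numpy as np

def weighted_l1_ista(A, y, w, h, n_iter=2000):
    """Minimize (1/(2m))*||A x - y||_2^2 + h * sum_i w_i*|x_i| by ISTA.

    A: (m, n) measurement matrix, y: (m,) measurements,
    w: (n,) positive diagonal of W, h: regularization parameter.
    """
    m, n = A.shape
    # Step size 1/L, where L = ||A||_2^2 / m bounds the curvature of the data term.
    step = m / np.linalg.norm(A, 2) ** 2
    x = np.zeros(n)
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y) / m           # gradient of the smooth part
        z = x - step * grad                    # gradient step
        thresh = step * h * w                  # per-coordinate threshold
        x = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)  # soft-threshold
    return x
```

Larger weights $w_i$ raise the threshold on coordinate $i$ and so push that entry toward zero, which is how the weights encode prior belief about the support.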


The proof of this Lemma is given in Appendix A. 

Remark 1 Because measurement matrix $A$ is a Gaussian random
matrix, which has full rank with probability one, we
suppose the condition that $A_S$ (or $A_{\hat{S}}$) is full rank is always satisfied
throughout this paper. Note that for a matrix (or vector) $M$,
we denote by $M_\Lambda$ the reduced dimensional matrix (or vector)
built upon the columns (or entries) of $M$ whose indices are
included in set $\Lambda$.

Therefore, $x$ is the unique minimum of model (2) if

$$\frac{1}{m}A_{\hat{S}}^T(y - A_{\hat{S}}x_{\hat{S}}) = hW_{\hat{S}}u_{\hat{S}}, \qquad \left|\frac{1}{m}A_i^T(y - A_{\hat{S}}x_{\hat{S}})\right| < hw_i \ \text{ for } i \notin \hat{S} \tag{7}$$

On the other hand, assume $S$ is the support set of the
true signal $x^*$ with cardinality $|S| = k \ll n$. Denote
$S^c = \{1, 2, \ldots, n\} \setminus S$, where $\setminus$ represents set difference. We
can establish a sufficient condition under which the model (2)
recovers its support exactly, i.e., $\mathrm{sign}(\hat{x}) = \mathrm{sign}(x^*)$.


II. Main Results 

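At first, we denote the subdifferential of $\|Wx\|_1$ as

$$\partial\|Wx\|_1 = \{Wu \mid u^TWx = \|Wx\|_1,\ \|u\|_\infty \le 1\} = \{Wu \mid u_i = \mathrm{sign}(x_i) \text{ if } x_i \ne 0, \text{ and } u_i \in [-1, 1] \text{ otherwise}\} \tag{4}$$

where

$$\mathrm{sign}(x_i) = \begin{cases} +1, & \text{if } x_i > 0 \\ -1, & \text{if } x_i < 0 \\ 0, & \text{if } x_i = 0 \end{cases} \tag{5}$$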
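The subdifferential in (4) gives a direct numerical test of whether a candidate vector is optimal: solve the stationarity equation for $u$ and check that $u$ is admissible. The sketch below assumes the stationarity condition has the form $\frac{1}{m}A^T(Ax-y) + hWu = 0$ with strictly positive weights; the helper name `kkt_violation` is hypothetical.

```python
import numpy as np

def kkt_violation(A, y, x, w, h):
    """Largest violation of the optimality conditions at x (0 means optimal).

    Solves (1/m) A^T (A x - y) + h * w * u = 0 for u, then checks that
    u_i = sign(x_i) where x_i != 0 and |u_i| <= 1 where x_i == 0.
    """
    m = A.shape[0]
    u = -(A.T @ (A @ x - y)) / (m * h * w)     # the u implied by stationarity
    on = x != 0
    viol_on = np.abs(u[on] - np.sign(x[on]))           # must match the sign
    viol_off = np.maximum(np.abs(u[~on]) - 1.0, 0.0)   # must lie in [-1, 1]
    return float(max(viol_on.max(initial=0.0), viol_off.max(initial=0.0)))
```

A value near zero indicates that the candidate satisfies the first-order conditions up to numerical precision; a strictly interior $|u_i| < 1$ off the support additionally signals uniqueness when the support columns have full rank.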
Lemma 2 The support of signal $x^*$ can be recovered exactly
from the solution of model (2), i.e., $\mathrm{sign}(\hat{x}) = \mathrm{sign}(x^*)$,
provided the following events are satisfied

$$\begin{aligned} &1)\ \left|\frac{1}{m}A_i^T\{(I - A_SA_S^+)Z + mhA_S(A_S^TA_S)^{-1}W_Su_S\}\right| < hw_i \ \text{ for all } i \in S^c, \\ &2)\ \mathrm{sign}(x_S^* + A_S^+Z - mh(A_S^TA_S)^{-1}W_Su_S) = \mathrm{sign}(x_S^*) \end{aligned} \tag{8}$$

where $A_S^+ = (A_S^TA_S)^{-1}A_S^T$ is the pseudoinverse of $A_S$ and
$u_S = \mathrm{sign}(x_S^*)$.

The proof of this Lemma is given in Appendix A. 
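For a given draw of $A$ and $Z$, the two events of Lemma 2 can be evaluated directly, which is convenient for checking the lemma against a solver. This is a sketch under the assumption that the events read as $|\frac{1}{m}A_i^T\{(I-A_SA_S^+)Z + mhA_S(A_S^TA_S)^{-1}W_Su_S\}| < hw_i$ on $S^c$ and $\mathrm{sign}(x_S^* + A_S^+Z - mh(A_S^TA_S)^{-1}W_Su_S) = \mathrm{sign}(x_S^*)$; the function name is illustrative.

```python
import numpy as np

def lemma2_conditions(A, Z, x_star, w, h):
    """Return (event1, event2) of Lemma 2 as booleans for one realization."""
    m, n = A.shape
    S = np.flatnonzero(x_star)                  # true support
    Sc = np.setdiff1d(np.arange(n), S)          # its complement
    A_S = A[:, S]
    u_S = np.sign(x_star[S])
    gram_inv = np.linalg.inv(A_S.T @ A_S)       # (A_S^T A_S)^{-1}
    pinv = gram_inv @ A_S.T                     # pseudoinverse A_S^+
    Wu = w[S] * u_S                             # W_S u_S
    # Event 1: strict dual feasibility on the off-support coordinates.
    resid = (Z - A_S @ (pinv @ Z)) + m * h * (A_S @ (gram_inv @ Wu))
    event1 = bool(np.all(np.abs(A[:, Sc].T @ resid) / m < h * w[Sc]))
    # Event 2: the restricted solution keeps the signs of x*_S.
    x_S = x_star[S] + pinv @ Z - m * h * (gram_inv @ Wu)
    event2 = bool(np.all(np.sign(x_S) == u_S))
    return event1, event2
```

The second event fails as soon as a nonzero entry is smaller than the shrinkage induced by $h$, which is the dependence on magnitudes that the first event does not have.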

Remark 2 The first condition in (8) is a recovery guarantee
for the zero entries in the signal $x^*$, from which we can
find that the recoverability of the zero entries in the signal
$x^*$ obtained by solving the problem (2) depends on the
measurement matrix $A$, the weights $W$, the parameter $h$, the
noise $Z$ and the sign pattern of signal $x^*$, but not on the
magnitudes of its nonzero entries. In contrast, from the second
condition in (8), the recoverability of the nonzero entries in
the signal $x^*$ also depends on their magnitudes, in addition to
the factors mentioned above.

Next, precise conditions on the system parameters (m, n, k) 
can be obtained which are sufficient to guarantee the support 
recovery. We state the conclusion in the following Theorem. 


Theorem 1 For a $k$-sparse signal $x^*$ with $k \ll n$, problem
(2) is solved to recover its support $S$ from linear measurement
$y = Ax^* + Z$. Define the gap

$$g(h) = c_3h\|W_Su_S\|_\infty + 6\sqrt{\frac{\sigma_z^2\log(k)}{m\sigma_a^2}} \tag{9}$$

If $\forall i \in S$, $|x_i^*| > g(h)$ holds, and if for some fixed $\epsilon' > 0$, the
triple $(m, n, k)$ and regularization parameter $h$ obey

$$m > 2\eta k\log(n - k)(1 + \epsilon')\left(1 + \frac{\sigma_a^2\sigma_z^2}{h^2k}\right) \tag{10}$$

where $\eta = \max_{i \in S^c}\{\xi / w_i^2\}$ with $\xi = \frac{1}{k}\sum_{i=1}^{k} w_{S,i}^2$ and $w_{S,i}$ represents
the $i$-th diagonal element in the matrix $W_S$, then the
solution $\hat{x}$ of problem (2), with probability greater than
$1 - c_1\exp(-c_2\min\{k, \log(n - k)\})$ for some positive constants
$c_1$ and $c_2$, recovers the support of signal $x^*$ exactly, i.e.,
$\mathrm{sign}(\hat{x}) = \mathrm{sign}(x^*)$.


The proof of this Theorem is given in Appendix B. 


Remark 3 Theorem 1 indicates that if $m > 2\eta k\log(n - k)$
holds and the nonzero entries of $x^*$ are large enough, model
(2) can, with high probability, recover the support of signal $x^*$
exactly, where the key parameter $\eta$ is directly related
to the model weights. In real applications, we can significantly
reduce the sample requirement for support recovery by
optimizing the weights in model (2) so as to make $\eta$ as
small as possible.
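To see how the weights enter the sample requirement, $\eta$ can be computed for a given weight vector and support. The sketch below assumes the reading $\eta = \max_{i \in S^c} \xi / w_i^2$ with $\xi$ the mean squared weight on the support, which reproduces the values $\eta = 1$, $0.5$ and $0.25$ quoted for the three weight choices in Section III; both function names are illustrative.

```python
import numpy as np

def eta_from_weights(w, support):
    """eta = max over off-support i of xi / w_i^2, xi = mean of w_i^2 on the support."""
    w = np.asarray(w, dtype=float)
    on = np.zeros(w.size, dtype=bool)
    on[np.asarray(support)] = True
    xi = np.mean(w[on] ** 2)
    return float(np.max(xi / w[~on] ** 2))

def leading_sample_size(eta, n, k):
    """Leading-order sample requirement 2 * eta * k * log(n - k)."""
    return 2.0 * eta * k * np.log(n - k)
```

For example, halving the weights on the true support while keeping unit weights elsewhere gives $\eta = 0.25$, so the leading-order requirement drops to a quarter of the unweighted one.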


Remark 4 A result similar to the one in can be shown, 
if we set 


$$h = \sqrt{\frac{2\phi_n\eta\sigma_a^2\sigma_z^2\log(n - k)}{m}} \tag{11}$$

for some $\phi_n > 2$, then it suffices to have $m > 2\eta k\log(n - k)\,\frac{1}{1 - 1/\phi_n}\,(1 + \epsilon)$
for some $\epsilon > 0$. Moreover, if we
choose an $h$ with $\phi_n \to +\infty$, then Theorem 1 guarantees
the support recovery of $x^*$ with about $m = 2\eta k\log(n - k)$
samples.
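The step from the condition of Theorem 1 to the requirement in Remark 4 can be made explicit. The following is a sketch under two assumptions about the damaged originals: that the last factor in (10) is $1 + \sigma_a^2\sigma_z^2/(h^2k)$, and that (11) amounts to $h^2 = 2\phi_n\eta\sigma_a^2\sigma_z^2\log(n-k)/m$. It also needs $\phi_n > 1 + \epsilon'$ so that the bracket on the right is positive.

```latex
\begin{align*}
\frac{\sigma_a^2\sigma_z^2}{h^2 k}
  &= \frac{m}{2\phi_n \eta k \log(n-k)}, \\
m > 2\eta k\log(n-k)(1+\epsilon')
     \left(1 + \frac{m}{2\phi_n\eta k\log(n-k)}\right)
  &\iff
  m\left(1 - \frac{1+\epsilon'}{\phi_n}\right) > 2\eta k\log(n-k)(1+\epsilon'), \\
m > \frac{1+\epsilon'}{1 - (1+\epsilon')/\phi_n}\; 2\eta k\log(n-k)
  &\;\longrightarrow\; 2\eta k\log(n-k)(1+\epsilon')
  \qquad (\phi_n \to +\infty).
\end{align*}
```

The prefactor decreases toward $1 + \epsilon'$ as $\phi_n$ grows, which is why a large $\phi_n$ leaves roughly $2\eta k\log(n-k)$ samples as the requirement.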


Remark 5 A special case of the weighted $\ell_1$ minimization model
is the Modified-CS, which weights the partially known
support as zero. According to condition (10), we can find that if
the prior support information is accurate, this weight strategy
ensures that $\eta < 1$ holds. Compared with the classical
result $m > 2k\log(n - k)$ [1] required by the BPDN where
$\eta = 1$, Modified-CS achieves a reduced sample requirement
by exploiting prior support information.

Figure 1. Simulation results of model (2) with $\eta = 1$. (a) The probabilities
of support recovery versus the sample size $m$ for three different problem sizes
$n$, in all cases with sparsity k = [0.4n^'®]. (b) The probabilities of support
recovery versus the rescaled sample size $\theta(m, n, k) = m/[2k\log(n - k)]$.

III. Simulation Results 

In this section, some simulations have been conducted
to validate the scaling law built in Theorem 1. In our
experiments, the nonzero elements of the $k$-sparse signals are $\pm 1$
uniformly at random. Measurement matrix $A \in \mathbb{R}^{m \times n}$ is
drawn randomly from the standard Gaussian distribution, i.e.,
$A_{ij} \sim \mathcal{N}(0, 1)$ i.i.d., and noise $Z \sim \mathcal{N}(0, \sigma_z^2 I)$ with
$\sigma_z = 0.5$. Based on Remark 4, the choice of $h$ follows
equation (11) with $\phi_n = 9$ in our experiments. At first, the
standard BP model is employed to recover the support of the
$k$-sparse signals $x^*$. According to Theorem 1, the standard
BP model, as a special case of the weighted $\ell_1$ minimization
model, has $\eta = 1$. In Fig. 1(a), we plot the probabilities of
support recovery versus the sample size $m$ for three different
problem sizes $n \in \{512, 1024, 2048\}$, and k = [0.4n°-®]
in each case. We repeat each experiment 200 times at each
point. As expected, the probability of support recovery rises
from zero to one as the number of samples increases, and
larger problems require more samples. However, Theorem 1
predicts the scaling

$$m > 2\eta k\log(n - k)(1 + \epsilon')\left(1 + \frac{\sigma_a^2\sigma_z^2}{h^2k}\right) = 2\eta Ck\log(n - k) \tag{12}$$

where $C$ is a constant. Thus, Fig. 1(b) plots the same
experimental results, but the probabilities of support recovery are
now plotted versus an "appropriately rescaled" version of the
sample size, i.e., $\theta(m, n, k) = m/[2k\log(n - k)]$. In Fig. 1(b),
all of the curves now line up with one another, even though the
problem sizes and sparsity levels vary dramatically. All of
the cases reach a probability of support recovery equal
to one at $\theta(m, n, k) = C \approx 2$. The experimental
result therefore matches the theoretical prediction in Theorem 1 very
well. Note that a similar simulation was carried out in [1] to
confirm the scaling law of the standard BP model.
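The experiment described above can be sketched as a small Monte Carlo loop: draw a random support with $\pm 1$ entries, a standard Gaussian $A$ and Gaussian noise, solve the weighted problem, and count exact sign matches. This is an illustrative sketch, not the authors' code: the proximal-gradient solver, the function names, the default trial count and the choice to pass $h$ and $k$ in as arguments are all assumptions, and the data term is taken to be $\frac{1}{2m}\|Ax-y\|_2^2$.

```python
import numpy as np

def solve_weighted_l1(A, y, w, h, n_iter=500):
    """ISTA for (1/(2m))*||A x - y||_2^2 + h * sum_i w_i*|x_i| (assumed scaling)."""
    m = A.shape[0]
    step = m / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y)) / m
        x = np.sign(z) * np.maximum(np.abs(z) - step * h * w, 0.0)
    return x

def rescaled_sample_size(m, n, k):
    """theta(m, n, k) = m / (2 k log(n - k))."""
    return m / (2.0 * k * np.log(n - k))

def recovery_rate(n, k, m, h, w, sigma_z=0.5, trials=20, seed=0):
    """Fraction of trials in which sign(x_hat) equals sign(x*) exactly."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x_star = np.zeros(n)
        S = rng.choice(n, size=k, replace=False)       # random support
        x_star[S] = rng.choice([-1.0, 1.0], size=k)    # +/-1 entries
        A = rng.standard_normal((m, n))                # A_ij ~ N(0, 1)
        y = A @ x_star + sigma_z * rng.standard_normal(m)
        x_hat = solve_weighted_l1(A, y, w, h)
        hits += int(np.array_equal(np.sign(x_hat), np.sign(x_star)))
    return hits / trials
```

Sweeping `m` for several `n` and plotting `recovery_rate` against `rescaled_sample_size(m, n, k)` gives the kind of rescaled curves the text describes; the exact-zero test on $\hat{x}$ relies on the soft-threshold step producing exact zeros off the support.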

Further, the same experiments are performed with
the weighted $\ell_1$ minimization model, where the weights are not
all equal to one, to recover the support of the $k$-sparse signal $x^*$.
Two classes of weights are tested: one weights the nonzero
elements of the $k$-sparse signals with $w_i = \sqrt{2}/2$ and the other with
$w_i = 1/2$. The experimental results are plotted in Fig. 2 and
Fig. 3, respectively. According to Theorem 1, the weighted
$\ell_1$ minimization model has $\eta = 0.5$ and $\eta = 0.25$ for
the two classes of weights, respectively. As shown
in Fig. 2 and Fig. 3, the curves reach a probability of
support recovery equal to one at $\theta(m, n, k) = \frac{1}{2}C \approx 1$
and $\theta(m, n, k) = \frac{1}{4}C \approx 0.5$, respectively. All of the
simulation results match the theoretical predictions in Theorem
1 very well, which indicates that Theorem 1 provides a precise
prediction of the scaling law for the weighted $\ell_1$ minimization.



Figure 2. Simulation results of model (2) with $\eta = 0.5$. (a) The probabilities
of support recovery versus the sample size $m$ for three different problem sizes
$n$, in all cases with sparsity k = . (b) The probabilities of support
recovery versus the rescaled sample size $\theta(m, n, k) = m/[2k\log(n - k)]$.


APPENDIX A

A. Proof of Lemma 1

Proof: It is well known that the problem (2) can be
transformed into an equivalent constrained problem that
involves a continuous objective function over a compact set.
Therefore, its minimum is always achieved. Based on the first
order optimality condition, $x$ is a global minimum for
the model (2) if and only if $\exists\, Wu \in \partial\|Wx\|_1$ such that
$\frac{1}{m}A^T(Ax - y) + hWu = 0$. Thereby Lemma 1(a) is
established.

According to the standard duality theory, given the
subgradient $Wu \in \mathbb{R}^n$, any optimum $x \in \mathbb{R}^n$ of model (2)
must satisfy the complementary slackness condition $u^TWx =
\|Wx\|_1$. For all $i$ such that $|u_i| < 1$, this condition holds if
and only if $x_i = 0$. Further, if $|u_i| < 1$ for all $i \notin \hat{S}$ and
$A_{\hat{S}}$ is full rank, then $x$ can be determined uniquely from (6).
Therefore, Lemma 1(b) holds.

B. Proof of Lemma 2

Proof: Define an $n$-dimensional vector $x^\dagger$ as

$$\begin{cases} x^\dagger_S = x^*_S + A_S^+Z - mh(A_S^TA_S)^{-1}W_Su_S \\ x^\dagger_{S^c} = 0 \end{cases} \tag{13}$$

where $u_S = \mathrm{sign}(x^*_S)$. If the conditions in (8) are satisfied,
we will prove that vector $x^\dagger$ is the unique minimum of model
(2).

According to the second condition in (8), we have

$$\mathrm{sign}(x^\dagger_S) = \mathrm{sign}(x^*_S) \tag{14}$$

At the same time, utilizing the equality $u_S = \mathrm{sign}(x^*_S)$, it
follows

$$x^\dagger_S = x^*_S + A_S^+Z - mh(A_S^TA_S)^{-1}W_S\,\mathrm{sign}(x^\dagger_S) \tag{15}$$

Obviously, $x^\dagger_S$ satisfies the first condition in (7) with $\hat{S}$
replaced by $S$ and $x_{\hat{S}}$ replaced by $x^\dagger_S$.
Further, substituting (15) into the second condition in (7),
we have that $\forall i \notin S$

$$\left|\frac{1}{m}A_i^T(y - A_Sx^\dagger_S)\right| = \left|\frac{1}{m}A_i^T\{(I - A_SA_S^+)Z + mhA_S(A_S^TA_S)^{-1}W_S\,\mathrm{sign}(x^\dagger_S)\}\right| < hw_i \tag{16}$$

where the inequality in (16) utilizes the fact that $\mathrm{sign}(x^\dagger_S) =
u_S$ and follows from the first condition in (8). Hence, according
to the sufficient conditions in (7), $x^\dagger$ is the unique
minimum of model (2), i.e., $\hat{x} = x^\dagger$. Based on (13) and (14),
$\mathrm{sign}(\hat{x}) = \mathrm{sign}(x^*)$ holds.

Figure 3. Simulation results of model (2) with $\eta = 0.25$. (a) The probabilities
of support recovery versus the sample size $m$ for three different problem sizes
$n$, in all cases with sparsity k = . (b) The probabilities of support
recovery versus the rescaled sample size $\theta(m, n, k) = m/[2k\log(n - k)]$.

APPENDIX B

Proof of Theorem 1

In this section, the proof of Theorem 1
uses the techniques from [1], with appropriate modification to
account for the weighted $\ell_1$ norm that replaces the $\ell_1$ norm.

Proof: Based on Lemma 2, we conclude that model (2)
can recover the support of $x^*$ exactly, provided the events
in (8) are satisfied. Therefore, we first derive a precise
condition under which event 1) in (8) is satisfied with
high probability. Further, by bounding the quantity $(A_S^+Z -
mh(A_S^TA_S)^{-1}W_Su_S)$, another condition can be obtained to
guarantee that $\mathrm{sign}(\hat{x}_S) = \mathrm{sign}(x^*_S)$ holds with high probability.
Then, according to Lemma 2, the support of signal $x^*$ is, with
high probability, recovered exactly from the solution of model
(2).

For the event 1) in (8), conditioned on $A_S$ and noise $Z$, we
have that, for each $i \in S^c$,

$$r_i = \frac{1}{mh}A_i^T\left[(I - A_SA_S^+)Z + mhA_S(A_S^TA_S)^{-1}W_Su_S\right] \tag{17}$$

is zero-mean Gaussian with variance at most

$$\mathrm{var}(r_i \mid A_S, Z) = \left\|A_S(A_S^TA_S)^{-1}W_Su_S + (I - A_SA_S^+)\frac{Z}{mh}\right\|_2^2 \tag{18}$$

Further, because

$$\left[A_S(A_S^TA_S)^{-1}W_Su_S\right]^T(I - A_SA_S^+)\frac{Z}{mh} = 0 \tag{19}$$

by applying the Pythagorean Theorem, it follows that

$$\mathrm{var}(r_i \mid A_S, Z) = \left\|A_S(A_S^TA_S)^{-1}W_Su_S\right\|_2^2 + \left\|(I - A_SA_S^+)\frac{Z}{mh}\right\|_2^2 \tag{20}$$

For the first term in equation (20), we have

$$\begin{aligned} \left\|A_S(A_S^TA_S)^{-1}W_Su_S\right\|_2^2 &= \frac{1}{m}u_S^TW_S\left(\frac{A_S^TA_S}{m}\right)^{-1}W_Su_S \\ &\le \frac{1}{m}\left\|u_S^TW_S\right\|_2\left\|\left(\frac{A_S^TA_S}{m}\right)^{-1}W_Su_S\right\|_2 \\ &\le \frac{1}{m}\left\|u_S^TW_S\right\|_2\left\|W_Su_S\right\|_2\left|\left|\left|\left(\frac{A_S^TA_S}{m}\right)^{-1}\right|\right|\right|_2 \end{aligned} \tag{21}$$

where the first inequality follows from the Cauchy-Schwarz
inequality, $|||\cdot|||_2$ represents the spectral norm and the second
inequality follows from the definition of matrix norm.
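At the same time, we have

$$\left|\left|\left|\left(\frac{A_S^TA_S}{m}\right)^{-1}\right|\right|\right|_2 \le \left|\left|\left|\left(\frac{A_S^TA_S}{m}\right)^{-1} - (\sigma_a^2I)^{-1}\right|\right|\right|_2 + \left|\left|\left|(\sigma_a^2I)^{-1}\right|\right|\right|_2 \tag{22}$$

Applying Lemma 9 in [1], it follows that event

$$\left|\left|\left|\left(\frac{A_S^TA_S}{m}\right)^{-1} - (\sigma_a^2I)^{-1}\right|\right|\right|_2 \le \frac{8}{\sigma_a^2}\sqrt{\frac{k}{m}} \tag{23}$$

is satisfied with probability greater than $1 - 2\exp(-k/2)$.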
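The concentration of the inverse Gram matrix used here can be checked empirically on a single draw. The sketch below assumes the event has the form $|||(A_S^TA_S/m)^{-1} - (\sigma_a^2I)^{-1}|||_2 \le \frac{8}{\sigma_a^2}\sqrt{k/m}$, which is how the bound of Lemma 9 in [1] specializes to i.i.d. entries of variance $\sigma_a^2$; the function name and defaults are illustrative.

```python
import numpy as np

def inverse_gram_deviation(m, k, sigma_a=1.0, seed=0):
    """Return (spectral deviation of the inverse Gram matrix, assumed bound)."""
    rng = np.random.default_rng(seed)
    A_S = sigma_a * rng.standard_normal((m, k))    # m x k block with N(0, sigma_a^2) entries
    gram_inv = np.linalg.inv(A_S.T @ A_S / m)      # (A_S^T A_S / m)^{-1}
    target = np.eye(k) / sigma_a ** 2              # (sigma_a^2 I)^{-1}
    deviation = np.linalg.norm(gram_inv - target, 2)   # spectral norm
    bound = 8.0 / sigma_a ** 2 * np.sqrt(k / m)
    return float(deviation), float(bound)
```

In practice the observed deviation sits well below the bound once $m$ is several multiples of $k$, which reflects that the constant 8 is conservative.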
Recall the definition of vector $u_S$. We have

$$\left\|W_Su_S\right\|_2^2 = \sum_{i=1}^{k} w_{S,i}^2 = k\xi \tag{24}$$

where $\xi = \frac{1}{k}\sum_{i=1}^{k} w_{S,i}^2$ and $w_{S,i}$ represents the $i$-th diagonal
element in the matrix $W_S$. Consequently, combining equations
(21), (23) and (24), we obtain that event

$$\left\|A_S(A_S^TA_S)^{-1}W_Su_S\right\|_2^2 \le \left(1 + 8\sqrt{\frac{k}{m}}\right)\frac{\xi k}{m\sigma_a^2} \tag{25}$$

is satisfied with probability greater than $1 - 2\exp(-k/2)$.
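Turning to the second term in (20), we have

$$\left\|(I - A_SA_S^+)\frac{Z}{mh}\right\|_2^2 \le \frac{1}{mh^2}\frac{\|Z\|_2^2}{m} \tag{26}$$

since $(I - A_SA_S^+)$ is an orthogonal projection matrix. On
the other hand, $\|Z\|_2^2/\sigma_z^2$ is a $\chi^2$ variate with $m$ degrees of
freedom. Thus, applying the tail bounds for $\chi^2$ variates (see
Appendix J in [1]), we have that for all $\epsilon \in (0, 1/2)$,

$$P\left[\left\|(I - A_SA_S^+)\frac{Z}{mh}\right\|_2^2 \ge (1 + \epsilon)\frac{\sigma_z^2}{mh^2}\right] \le \exp\left(-\frac{3m\epsilon^2}{16}\right) \tag{27}$$

Combining (20), (23) and (27), we have that event

$$\mathrm{var}(r_i) \ge \Gamma := \left(1 + \max\left\{\epsilon, 8\sqrt{\frac{k}{m}}\right\}\right)\left(\frac{\xi k}{m\sigma_a^2} + \frac{\sigma_z^2}{mh^2}\right) \tag{28}$$

is satisfied with probability less than $4\exp(-c_1\min\{m\epsilon^2, k\})$
for some $c_1 > 0$.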

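Consequently, applying the standard Gaussian tail bounds
(see Appendix A in [1]), we have

$$\begin{aligned} P\left[\max_{i \in S^c}|r_i| > w_i\right] &= P\left[\max_{i \in S^c}|r_i| > w_i \,\middle|\, \mathrm{var}(r_i) < \Gamma\right]P[\mathrm{var}(r_i) < \Gamma] + P\left[\max_{i \in S^c}|r_i| > w_i \,\middle|\, \mathrm{var}(r_i) \ge \Gamma\right]P[\mathrm{var}(r_i) \ge \Gamma] \\ &\le P\left[\max_{i \in S^c}|r_i| > w_i \,\middle|\, \mathrm{var}(r_i) < \Gamma\right] + P[\mathrm{var}(r_i) \ge \Gamma] \\ &\le 2(n - k)\exp\left(-\frac{w_i^2}{2\Gamma}\right) + 4\exp(-c_1\min\{m\epsilon^2, k\}) \end{aligned} \tag{29}$$

In the high dimensional case, we can assume that when $m$ is
sufficiently large, the inequality $8\sqrt{k/m} \le \epsilon$ holds for any fixed $\epsilon > 0$
[1]. Hence, the exponential term in (29) is decaying, provided

$$m > 2\eta k\log(n - k)(1 + \epsilon')\left(1 + \frac{\sigma_a^2\sigma_z^2}{h^2k}\right) \tag{30}$$

where $\eta = \max_{i \in S^c}\{\xi / w_i^2\}$ and $\epsilon'$ satisfies $\log(n - k)(1 + \epsilon') =
(1 + \log(n - k))(1 + \epsilon)$.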


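For the event 2) in (8), we establish a bound on
$\|x^\dagger_S - x^*_S\|_\infty$. According to (13) and applying the triangle
inequality, we have

$$\left\|x^\dagger_S - x^*_S\right\|_\infty \le mh\left\|(A_S^TA_S)^{-1}W_Su_S\right\|_\infty + \left\|A_S^+Z\right\|_\infty \tag{31}$$

For the second term in (31), conditioned on $A_S$, random vector
$A_S^+Z$ is zero-mean Gaussian with variance at most

$$T = \frac{\sigma_z^2}{m}\left|\left|\left|\left(\frac{A_S^TA_S}{m}\right)^{-1}\right|\right|\right|_2 \tag{32}$$

As analyzed in [1],

$$P\left[T \ge \frac{9\sigma_z^2}{m\sigma_a^2}\right] \le 2\exp(-m/2) \tag{33}$$

By the total probability rule, it follows

$$P\left[\left\|A_S^+Z\right\|_\infty > t\right] \le P\left[\left\|A_S^+Z\right\|_\infty > t \,\middle|\, T \le \frac{9\sigma_z^2}{m\sigma_a^2}\right] + P\left[T > \frac{9\sigma_z^2}{m\sigma_a^2}\right] \tag{34}$$

Using the Gaussian tail bounds (see Appendix B), it follows

$$P\left[\left\|A_S^+Z\right\|_\infty > 6\sqrt{\frac{\sigma_z^2\log(k)}{m\sigma_a^2}}\right] \le 4\exp(-c_1m) \tag{35}$$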

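For the first term in (31), we have

$$mh\left\|(A_S^TA_S)^{-1}W_Su_S\right\|_\infty \le h\left\|\left[\left(\frac{A_S^TA_S}{m}\right)^{-1} - (\sigma_a^2I)^{-1}\right]W_Su_S\right\|_\infty + h\left\|(\sigma_a^2I)^{-1}W_Su_S\right\|_\infty \tag{36}$$

According to Lemma 5 in [1], we have

$$P\left[\left\|\left[\left(\frac{A_S^TA_S}{m}\right)^{-1} - (\sigma_a^2I)^{-1}\right]W_Su_S\right\|_\infty \ge c_1\left\|W_Su_S\right\|_\infty\right] \le 4\exp(-c_2\min\{k, \log(n - k)\}) \tag{37}$$

holds for some $c_2 > 0$. Therefore, it follows

$$P\left[mh\left\|(A_S^TA_S)^{-1}W_Su_S\right\|_\infty \ge c_3h\left\|W_Su_S\right\|_\infty\right] \le c_3'\exp(-c_2\min\{k, \log(n - k)\}) \tag{38}$$

holds for some $c_3, c_3' > 0$.

Combining (31), (35) and (38), we have that event

$$\left\|x^\dagger_S - x^*_S\right\|_\infty \le c_3h\left\|W_Su_S\right\|_\infty + 6\sqrt{\frac{\sigma_z^2\log(k)}{m\sigma_a^2}} = g(h) \tag{39}$$

is satisfied with probability greater than
$1 - c_3'\exp(-c_2\min\{k, \log(n - k)\})$.

Therefore, if $\forall i \in S$, $|x_i^*| > g(h)$ holds, we have that for
all $i \in S$, $\mathrm{sign}(x^\dagger_i) = \mathrm{sign}(x_i^*)$ holds with high probability.
Combining the probabilities that the two events in (8) are satisfied,
the conclusion in Theorem 1 holds.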